Beg, Borrow, Steal code that you need
R packages live on CRAN and its mirrors. To install an R package:
install.packages('dplyr', repos='http://cran.r-project.org')
To use a package, or rather, to use the functions from the package, you have to load it into R:
library(dplyr)
We’ll talk about packages later in the semester.
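As a small aside to loading with library(): a single function can also be called without attaching the whole package, using the `::` operator. The sketch below uses the tools package only because it ships with R and so runs without installing anything; the choice of file_ext() is purely illustrative.

```r
# Call one function from a package without library(), using pkg::function.
# 'tools' ships with R but is not attached by default.
ext <- tools::file_ext("lecture2_data/Dataset_spine.csv")
print(ext)

# requireNamespace() reports whether a package is installed,
# without attaching it to the search path.
has_tools <- requireNamespace("tools", quietly = TRUE)
print(has_tools)
```

The same pattern, for example dplyr::filter(), works for any installed package.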
We will concentrate now on what is known as Base R, that is, the functions that are available when R is installed
We will usually load CSV files, since they are the easiest for R. The typical suggestion if you have Excel data is to save the sheet as a CSV and then import it into R.
You can also load Excel files directly using either the readxl or rio packages
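The CSV route can be tried end to end without any external file. This is a minimal sketch with made-up toy values: it writes a small data frame to a temporary CSV and reads it back with read.csv(). Note that since R 4.0.0 read.csv() keeps text columns as character unless stringsAsFactors = TRUE is passed.

```r
# Toy data standing in for a sheet exported from Excel (hypothetical values).
toy <- data.frame(
  id    = 1:3,
  group = c("a", "b", "a"),
  score = c(2.5, 3.1, 4.0)
)

# Write it to a temporary CSV file, then read it back in.
csv_path <- tempfile(fileext = ".csv")
write.csv(toy, csv_path, row.names = FALSE)

# stringsAsFactors = TRUE turns text columns into factors;
# since R 4.0.0 the default is FALSE, which keeps them as character.
toy_back <- read.csv(csv_path, stringsAsFactors = TRUE)
str(toy_back)
```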
Data is typically in a rectangular format
Tidy data is a particularly amenable format for data analysis.
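As a rough illustration of what "tidy" usually means (each variable in its own column, each observation in its own row), the sketch below reshapes a small wide table into a long one using only Base R. The patient and day values are invented for the example.

```r
# Wide layout: one row per patient, one column per measurement day (toy values).
wide <- data.frame(
  patient = c("p1", "p2"),
  day1    = c(5.1, 4.8),
  day2    = c(5.6, 5.0)
)

# Tidy (long) layout: one row per patient-day observation.
long <- data.frame(
  patient = rep(wide$patient, times = 2),
  day     = rep(c("day1", "day2"), each = nrow(wide)),
  value   = c(wide$day1, wide$day2)
)
print(long)
```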
An example GEO dataset
Lower back pain symptoms dataset on Kaggle.com
Breast Cancer Proteome dataset on Kaggle.com
data_spine <- read.csv('lecture2_data/Dataset_spine.csv')
head(data_spine)
## Pelvic.incidence Pelvic.tilt Lumbar.lordosis.angle Sacral.slope
## 1 63.02782 22.552586 39.60912 40.47523
## 2 39.05695 10.060991 25.01538 28.99596
## 3 68.83202 22.218482 50.09219 46.61354
## 4 69.29701 24.652878 44.31124 44.64413
## 5 49.71286 9.652075 28.31741 40.06078
## 6 40.25020 13.921907 25.12495 26.32829
## Pelvic.radius Degree.spondylolisthesis Pelvic.slope Direct.tilt
## 1 98.67292 -0.254400 0.7445035 12.5661
## 2 114.40543 4.564259 0.4151857 12.8874
## 3 105.98514 -3.530317 0.4748892 26.8343
## 4 101.86850 11.211523 0.3693453 23.5603
## 5 108.16872 7.918501 0.5433605 35.4940
## 6 130.32787 2.230652 0.7899929 29.3230
## Thoracic.slope Cervical.tilt Sacrum.angle Scoliosis.slope
## 1 14.5386 15.30468 -28.658501 43.5123
## 2 17.5323 16.78486 -25.530607 16.1102
## 3 17.4861 16.65897 -29.031888 19.2221
## 4 12.7074 11.42447 -30.470246 18.8329
## 5 15.9546 8.87237 -16.378376 24.9171
## 6 12.0036 10.40462 -1.512209 9.6548
## Class.attribute
## 1 Abnormal
## 2 Abnormal
## 3 Abnormal
## 4 Abnormal
## 5 Abnormal
## 6 Abnormal
Ignore the first ##; it denotes that this is R output
View(data_spine) # It looks like a matrix
str(data_spine) # Structure of a dataset
## 'data.frame': 310 obs. of 13 variables:
## $ Pelvic.incidence : num 63 39.1 68.8 69.3 49.7 ...
## $ Pelvic.tilt : num 22.55 10.06 22.22 24.65 9.65 ...
## $ Lumbar.lordosis.angle : num 39.6 25 50.1 44.3 28.3 ...
## $ Sacral.slope : num 40.5 29 46.6 44.6 40.1 ...
## $ Pelvic.radius : num 98.7 114.4 106 101.9 108.2 ...
## $ Degree.spondylolisthesis: num -0.254 4.564 -3.53 11.212 7.919 ...
## $ Pelvic.slope : num 0.745 0.415 0.475 0.369 0.543 ...
## $ Direct.tilt : num 12.6 12.9 26.8 23.6 35.5 ...
## $ Thoracic.slope : num 14.5 17.5 17.5 12.7 16 ...
## $ Cervical.tilt : num 15.3 16.78 16.66 11.42 8.87 ...
## $ Sacrum.angle : num -28.7 -25.5 -29 -30.5 -16.4 ...
## $ Scoliosis.slope : num 43.5 16.1 19.2 18.8 24.9 ...
## $ Class.attribute : Factor w/ 2 levels "Abnormal","Normal": 1 1 1 1 1 1 1 1 1 1 ...
So this is a data.frame object with 310 observations and 13 variables, of which one (Class.attribute) is a factor and the other 12 are numeric
In the str() output, it looks like a list of columns
Data frames are the primary way of storing datasets in R
They were revolutionary in that they kept heterogeneous data together
They share properties of both a matrix and a list
class(data_spine)
## [1] "data.frame"
Technically, a data.frame is a list of vectors (or objects, generally) of the same length
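The list-like and matrix-like sides of a data frame can both be seen on a tiny example. This is a sketch with invented values; the column names height and name are placeholders, not part of the spine dataset.

```r
# A small data frame with one numeric and one character column (toy values).
df <- data.frame(height = c(1.62, 1.75, 1.80),
                 name   = c("Ann", "Bob", "Cy"),
                 stringsAsFactors = FALSE)

# List-like behaviour: each column is one element of the list.
is.list(df)         # a data frame is a list
length(df)          # the number of columns
df$height           # extract a column by name
df[["name"]]        # the same idea with double brackets
sapply(df, length)  # every column has the same length

# Matrix-like behaviour: rows and columns can be indexed together.
dim(df)             # rows, then columns
df[2, "name"]       # row 2 of the 'name' column
```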
A matrix is a rectangular array of data of the same type
matrix(0, nrow=2, ncol=4)
## [,1] [,2] [,3] [,4]
## [1,] 0 0 0 0
## [2,] 0 0 0 0
matrix(letters, nrow=2)
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
## [1,] "a" "c" "e" "g" "i" "k" "m" "o" "q" "s" "u" "w" "y"
## [2,] "b" "d" "f" "h" "j" "l" "n" "p" "r" "t" "v" "x" "z"
matrix(letters, nrow=2, byrow=TRUE) # Fill row by row instead of column by column
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
## [1,] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m"
## [2,] "n" "o" "p" "q" "r" "s" "t" "u" "v" "w" "x" "y" "z"
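Because a matrix holds a single type, mixing types in the input causes R to coerce everything to the most general one. The following sketch, using made-up values, shows the two most common cases.

```r
# Mixing numbers and text: c() coerces everything to character first.
mixed <- matrix(c(1, "a", 2, "b"), nrow = 2)
print(mixed)
typeof(mixed)    # character

# Mixing numbers and logicals: TRUE/FALSE become 1/0.
num_lgl <- matrix(c(1.5, TRUE, 2.5, FALSE), nrow = 2)
print(num_lgl)
typeof(num_lgl)  # double
```

This is the main practical difference from a data frame, where each column keeps its own type.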
You can create a matrix from a set of vectors of the same length
x <- c(1,2,3,4)
y <- c(10,20,30,40)
Put columns together
cbind(x,y) # Column bind
## x y
## [1,] 1 10
## [2,] 2 20
## [3,] 3 30
## [4,] 4 40
Put rows together
rbind(x,y) # Row bind
## [,1] [,2] [,3] [,4]
## x 1 2 3 4
## y 10 20 30 40
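The two bindings are transposes of each other, which can be checked directly. A minimal sketch, reusing the same x and y as above:

```r
x <- c(1, 2, 3, 4)
y <- c(10, 20, 30, 40)

by_col <- cbind(x, y)  # x and y become columns
by_row <- rbind(x, y)  # x and y become rows

dim(by_col)            # 4 rows, 2 columns
dim(by_row)            # 2 rows, 4 columns

# Transposing the column-bound matrix gives the row-bound one.
all(t(by_col) == by_row)
```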
example_matrix <- rbind(x,y)
example_matrix
## [,1] [,2] [,3] [,4]
## x 1 2 3 4
## y 10 20 30 40
example_matrix[1,] # Extracts 1st row
## [1] 1 2 3 4
example_matrix[,2] # Extracts 2nd column as a vector, prints horizontally
## x y
## 2 20
example_matrix[1,4] # Extracts the element in row 1, column 4
## x
## 4
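A few more indexing forms follow the same [row, column] pattern. This sketch rebuilds the matrix so that it runs on its own; drop = FALSE is a standard argument of `[` that keeps the result as a matrix instead of simplifying it to a vector.

```r
example_matrix <- rbind(x = c(1, 2, 3, 4), y = c(10, 20, 30, 40))

example_matrix["y", ]              # select a row by its name
example_matrix[, c(1, 3)]          # select several columns at once
example_matrix[, 2, drop = FALSE]  # keep the result as a 2 x 1 matrix
example_matrix[, -1]               # a negative index drops that column
```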
example_matrix
## [,1] [,2] [,3] [,4]
## x 1 2 3 4
## y 10 20 30 40
nrow(example_matrix) # Number of rows
## [1] 2
ncol(example_matrix) # Number of columns
## [1] 4
dim(example_matrix) # Shortcut for both of the above
## [1] 2 4
example_matrix
## [,1] [,2] [,3] [,4]
## x 1 2 3 4
## y 10 20 30 40
example_matrix + 5 # Add 5 to each element
## [,1] [,2] [,3] [,4]
## x 6 7 8 9
## y 15 25 35 45
example_matrix * 2 # Multiply each element by 2
## [,1] [,2] [,3] [,4]
## x 2 4 6 8
## y 20 40 60 80
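Arithmetic with + and * always works element by element, which is easy to confuse with matrix multiplication. The sketch below rebuilds the matrix so it is self-contained and contrasts the two. The %*% operator is Base R's matrix product and is shown only for comparison.

```r
example_matrix <- rbind(x = c(1, 2, 3, 4), y = c(10, 20, 30, 40))

# Element-wise: each entry is multiplied by the matching entry.
squared <- example_matrix * example_matrix
print(squared)

# A shorter vector is recycled down the columns, so with two rows
# 100 is added to row x and 200 to row y.
shifted <- example_matrix + c(100, 200)
print(shifted)

# Matrix multiplication uses %*% instead of *.
cross <- example_matrix %*% t(example_matrix)
print(cross)
```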